Using FileIO and EncryptionManager for MR InputFiles#985
Conversation
| out.writeInt(data.length); | ||
| out.write(data); | ||
|
|
||
| byte[] ioData = SerializationUtil.serializeToBytes(io); |
There was a problem hiding this comment.
@rdsr, I just noticed this. Do we need custom Serializable implementations? If all of the fields are Serializable and this is Serializable then the default implementation should just work.
There was a problem hiding this comment.
If we want to verify this, we can add a test for serializability, too.
There was a problem hiding this comment.
sounds good. I'll double check this.
There was a problem hiding this comment.
@rdblue, can you elaborate more on what is the default implementation here? I don't think there's a default implementation for Serializable in MR. There's a default impl. for Writables
There was a problem hiding this comment.
Java serialization will work without implementing write and readFields as long as all of the fields in an object are Serializable. The only reason to implement this is to wrap a non-serializable object or to customize in some way how serialization happens. Because all 3 fields that need to be serialized are already Serializable (FileScanTask, FileIO, and EncryptionManager) I don't think we need these methods.
There was a problem hiding this comment.
To follow up on this, I don't think that we need to remove it in this PR. But we should be able to remove these in a follow-up.
| }); | ||
| } | ||
|
|
||
| private FileIO fileIO(Table table) { |
There was a problem hiding this comment.
nit:seems like this method can be made static?
| out.writeInt(data.length); | ||
| out.write(data); | ||
|
|
||
| byte[] ioData = SerializationUtil.serializeToBytes(io); |
There was a problem hiding this comment.
@rdblue, can you elaborate more on what is the default implementation here? I don't think there's a default implementation for Serializable in MR. There's a default impl. for Writables
|
|
||
| private FileIO fileIO(Table table) { | ||
| if (table.io() instanceof HadoopFileIO) { | ||
| return new HadoopFileIO(((HadoopFileIO) table.io()).conf()); |
There was a problem hiding this comment.
Why does this create a new HadoopFileIO? The table's FileIO should work fine.
I think this might be based on the handling in Spark, which creates a new one that wraps a broadcasted manifest. But there is no broadcast for MR so I don't think we need this method. It should work fine to use table.io().
|
This seems close to completion @jerryshao . Can we get this in, after addressing those minor comments? |
|
This was fixed in #1532, so I'll close this one. Thanks for working on this! |
This address the left issue in #865
CC @rdsr @rdblue